Improving the Accuracy of Pre-trained Word Embeddings for Sentiment Analysis
نویسندگان
چکیده
Sentiment analysis is one of the well-known tasks and fast growing research areas in natural language processing (NLP) and text classifications. This technique has become an essential part of a wide range of applications including politics, business, advertising and marketing. There are various techniques for sentiment analysis, but recently word embeddings methods have been widely used in sentiment classification tasks. Word2Vec and GloVe are currently among the most accurate and usable word embedding methods which can convert words into meaningful vectors. However, these methods ignore sentiment information of texts and need a huge corpus of texts for training and generating exact vectors which are used as inputs of deep learning models. As a result, because of the small size of some corpuses, researcher often have to use pre-trained word embeddings which were trained on other large text corpus such as Google News with about 100 billion words. The increasing accuracy of pre-trained word embeddings has a great impact on sentiment analysis research. In this paper we propose a novel method, Improved Word Vectors (IWV), which increases the accuracy of pre-trained word embeddings in sentiment analysis. Our method is based on Part-of-Speech (POS) tagging techniques, lexicon-based approaches and Word2Vec/GloVe methods. We tested the accuracy of our method via different deep learning models and sentiment datasets. Our experiment results show that Improved Word Vectors (IWV) are very effective for sentiment analysis.
منابع مشابه
emoji2vec: Learning Emoji Representations from their Description
Many current natural language processing applications for social media rely on representation learning and utilize pre-trained word embeddings. There currently exist several publicly-available, pre-trained sets of word embeddings, but they contain few or no emoji representations even as emoji usage in social media has increased. In this paper we release emoji2vec, pre-trained embeddings for all...
متن کاملRefining Word Embeddings for Sentiment Analysis
Word embeddings that can capture semantic and syntactic information from contexts have been extensively used for various natural language processing tasks. However, existing methods for learning contextbased word embeddings typically fail to capture sufficient sentiment information. This may result in words with similar vector representations having an opposite sentiment polarity (e.g., good an...
متن کاملSupplementary Materials for Select-Additive Learning: Improving Generalization in Multimodal Sentiment Analysis
We extracted an embedding for each word in the text sentence of the utterance using a word2vec dictionary pre-trained on a Google News corpus [3]. The text input of each utterance was formed by concatenating the word embeddings for all the words in the sentence and padding them with the appropriate zeros to have the same dimension. We set the maximum length as 60 and discard additional words (o...
متن کاملConvolutional Neural Networks for Sentiment Classification on Business Reviews
Recently Convolutional Neural Networks (CNNs) models have proven remarkable results for text classification and sentiment analysis. In this paper, we present our approach on the task of classifying business reviews using word embeddings on a large-scale dataset provided by Yelp: Yelp 2017 challenge dataset. We compare word-based CNN using several pre-trained word embeddings and end-to-end vecto...
متن کاملBB_twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs
In this paper we describe our attempt at producing a state-of-the-art Twitter sentiment classifier using Convolutional Neural Networks (CNNs) and Long Short Term Memory (LSTMs) networks. Our system leverages a large amount of unlabeled data to pre-train word embeddings. We then use a subset of the unlabeled data to fine tune the embeddings using distant supervision. The final CNNs and LSTMs are...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1711.08609 شماره
صفحات -
تاریخ انتشار 2017